Abstract

In recent years, traffic localization techniques have attracted growing
interest, driven by the problem of hotspot offloading, the emergence of
heterogeneous networks (HetNets) with small-cell deployment, and green
networking. Localizing traffic hotspots with high accuracy is of great
interest for deciding how congested zones can be offloaded, where small
cells should be deployed and how their sleep modes can be managed. We
propose, in this paper, a new hotspot localization technique based on the
direct exploitation of five Key Performance Indicators (KPIs) extracted
from the Operation and Maintenance (O&M) database of the network. These
KPIs are the Timing Advance (TA), the Angle of Arrival (AoA), the
neighboring cell level, the load time and two mean throughputs:
arithmetic (AMT) and harmonic (HMT). The combined use of these KPIs,
projected over a coverage map, yields a promising localization precision
and can be further optimized by exploiting commercial data on potential
hotspots. This solution can be implemented in the network at an
appreciably lower cost than widely used probing methods.
1 Introduction

Localization of traffic hotspots is often one of the first steps in
network planning and optimization, especially in the context of newly
proposed technologies within 3GPP standards such as Heterogeneous
Networks (HetNets) [2]. HetNets are composed of small cells which are
deployed, in addition to macro cells, in areas of capacity bottlenecks
that typically correspond to hotspots. The efficiency of the deployed
solution in absorbing traffic from the congested zone therefore depends
on how accurately the traffic hotspot zones are localized. Several
traffic localization techniques have been proposed for 3GPP Long Term
Evolution (LTE) systems. They are mostly based on probing and include
trace analysis and protocol decoding [3-5]. The most important
information extracted from traces is the received power level and the
timing advance. The authors of [3] provided a test transmitter which
first plays the role of a neighboring cell and then the role of a serving
Base Station (BS). Tests are carried out within the existing cells of the
network in order to assess the traffic density in the vicinity of the
test transmitter. This solution requires configuring the test transmitter
and performing measurements in each area separately, so it may take a
long time to assess the traffic distribution in the entire zone covered
by a BS. In the same context, patent [4] disclosed a method where the
User Equipments (UEs) periodically send a report of radio measurements
from the serving cell and the neighboring cells. A recording unit is
installed at the interface between the BS and the studied Base Station
Controller (BSC), termed A-bis, to examine the messages exchanged on this
interface. Based on these measurements, the traffic distribution is
calculated. It was shown that the precision of this method does not match
small cell dimensions. Another method was presented in patent [5] and
treated traffic hotspot localization in GSM using statistical analysis of
the timing advance and neighboring cell measurements extracted from
traces. The accuracy of this method is improved compared to the previous
ones but is still insufficient because it only localizes the number of
UEs and omits the data volume. Furthermore, proximity location is another
localization method, based on the detection of nearby Wi-Fi Access Points
(APs): the position of the UE is calculated from the known position of
the AP [6,9]. RF fingerprinting [7,9], cell-ID (LTE Rel 8) and A-GPS (LTE
Rel 9) [9] are also well-known techniques for locating individual UEs
with different accuracy levels. However, these individual locating
methods are quite complex to use for generating a spatial traffic
distribution, because this involves a large number of UEs from which
information about the hotspot traffic distribution has to be extracted.

In sum, the probing-based methods suffer from several shortcomings.
First, not all traces are captured by probes; some of them are lost.
Second, they require high-capacity storage servers that cannot be
provided by existing Operation and Maintenance (O&M) databases. Third,
the position of each UE is calculated separately from the recorded
traces, which makes the localization process heavy. Finally, the tools
needed for probing, storage and processing are costly.

Another family of localization methods makes use of Key Performance
Indicators (KPIs) in order to infer the level of traffic in each area of
the cell. In [8], the authors proposed to divide each cell of the network
into several subareas, and to calculate a traffic value for each subarea
from O&M measurements in the network. The indicators used to compute
traffic values are the load of the cell, the call attempts and the number
of handovers. This method is very simple and does not require additional
tests or equipment. However, it is limited in high data rate networks,
such as 4G systems, because it only localizes traffic hotspots in terms
of number of UEs. The idea of [8] can be further improved by considering
other KPIs. Besides, the traffic localization described in [8] is based
on the definition of an optimization problem where the number of unknown
variables is the number of subareas. In this case, the method has a
significant computational cost, especially for a precision approaching
small cell dimensions.

In order to improve the method used in [8] while keeping computations
fast, we propose, in this paper, a new method for hotspot localization
based on the combined use of several KPIs directly obtained from the O&M.
These KPIs are: Timing Advance (TA), Angle of Arrival (AoA), neighboring
cell level, load time and two mean throughputs: arithmetic (AMT) and
harmonic (HMT). Simulation results show that the proposed method achieves
acceptable localization with sufficient precision and significant savings
on the cost of localization, as compared to probing-based techniques.


The main contributions of this paper are threefold. The first
contribution is the use of the AoA, the correlation between the
neighboring cells' traffic loads and the difference between the AMT and
the HMT, in addition to the traditionally used TA and neighboring cell
level, together with the projection of the information extracted from
these KPIs over the coverage map. The second contribution is the
definition of a global metric combining all five previously cited KPIs.
This metric is further optimized using additional information about
potential hotspot zones obtained from commercial data. Third, smoothing
the estimated map is proposed as an additional step in order to make the
estimated traffic distribution more precise.

The remainder of this paper is organized as follows. A general
description of the process of traffic localization is provided and the
key motivations behind the use of the KPIs are given in Section 2. Then,
the three main inputs of the hotspot localization algorithm are detailed
in Section 3. In Section 4, we present the hotspot localization algorithm
and its optimization. Section 5 contains simulations and corresponding
results. Section 6 concludes the paper.

2 Traffic localization method: Process and 
key motivations for using KPIs 

The proposed algorithm of hotspot localization is depicted in Fig. 1.
Five KPIs define the first main inputs of the algorithm: TA, AoA,
neighboring cell level, correlation between traffic loads and the
difference between the AMT and the HMT. Extracting information from these
5 KPIs and projecting it over the coverage map, as will be explained
later in Section 4, enables us to obtain the traffic level in each
pixel [1] of the coverage map.

The 5 KPIs are chosen to satisfy two important criteria: locating
hotspots in terms of number of connected UEs and in terms of traffic
volume. In fact, TA, AoA and neighboring cell level can be sufficient for
calculating the spatial distribution of connected UEs. However, in
practice, a hotspot is characterized not only by a high density of
connected UEs but also by the traffic volume, and the latter does not
follow exactly the same evolution as the number of connected UEs.
Therefore, the traffic hotspot localization is improved by directly
adding data-volume related KPIs such as the correlation between the
cells' traffic loads and the difference between the AMT and the HMT.

[1] The coverage map of the network is divided into small areas called
pixels. The size of each pixel defines the resolution of the coverage map
(often between 25 and 50 meters in practice).

Figure 1: General process of hotspot localization.


The importance of each KPI is measured and an importance factor is
assigned to it in order to avoid over-confidence in the localization of
traffic hotspots due to the correlation between KPIs. Optimal importance
factors are found by solving a least squares problem minimizing the
distance between the estimated map and the map of potential hotspots. The
map of potential hotspots represents additional information about
potential hotspot zones obtained from commercial data. Then, we define a
global metric combining all five previously cited KPIs and using the
optimal importance factors. The localization of traffic hotspot zones can
give significant weights to isolated areas, or pixels, as well as to
pixels at the edge of a hotspot. Smoothing the estimated map is therefore
proposed as an additional step in order to make the estimated traffic
distribution more accurate.

In this paper, we are mainly interested in combining the O&M KPIs,
projecting them over the coverage map and smoothing the estimated
distribution. The map of potential hotspots does not play a crucial role
in this method of hotspot localization: it is possible to manually tune
the importance factors until a good localization accuracy is reached.
Nevertheless, a possible additional step in the localization algorithm is
expected to provide better results. This step simply consists in
projecting the O&M KPIs over an available map of potential hotspots.

It is possible to add other KPIs to the algorithm of traffic
localization. In such cases, however, it is essential to properly combine
the KPIs in order to avoid over-confidence due to correlated KPIs.

3 INPUTS FOR TRAFFIC 
LOCALIZATION 

3.1 Coverage map: Definition and notation 

A coverage map is a map of a bounded geographical area where the signal
level from each cell is given in each pixel. The coverage map is often
taken from coverage prediction tools or from real measurements such as
coverage provided by the Minimization of Drive Tests (MDT) techniques
[10]. In this paper, the coverage map is denoted by $\mathcal{L}$ and
constitutes a bounded part of $\mathbb{R}^2$. Without loss of generality,
we assume that $\mathcal{L}$ has the form of a square and is discretized
into equal pixels.

For $1 \le i, j \le m$, we designate by $P_{i,j}$ the coordinates of each
pixel in $\mathcal{L}$ and we assume that the pixel $P_{0,0}$ is located
at the lower left corner of the map (the origin of $\mathbb{R}^2$).

We denote by $\mathcal{C} = \{C_1, C_2, \ldots, C_n\}$ the set of the
cells located in the coverage map $\mathcal{L}$, where $C_k$ is the
geographical area covered by cell $k$ and $n$ is the number of cells. It
is clear that $\bigcup_{k=1}^{n} C_k \subset \mathcal{L}$ because some
pixels are not covered by any cell.

Let $RSRP_{k,i,j}$ be the Reference Signal Received Power (or its
equivalent, the RSCP in WCDMA and the RxLev in GSM networks) [11] from
cell $k$ in pixel $P_{i,j}$; then the cell coverage $C_k$ is given, for
$1 \le k \le n$, by

$$C_k = \{ P_{i,j} \in \mathcal{L} \text{ such that } RSRP_{k,i,j} = \max_{1 \le l \le n} RSRP_{l,i,j} \}$$    (1)

For every pixel $P_{i,j}$ in $\mathcal{L}$, we denote by $c^*_{i,j}$ and
$c_{i,j}$ respectively the index of the first and second best serving
cell:

$$c^*_{i,j} = \arg\max_{1 \le l \le n} RSRP_{l,i,j}$$    (2)

$$c_{i,j} = \arg\max_{1 \le l \le n,\ l \ne c^*_{i,j}} RSRP_{l,i,j}$$    (3)

Fig. 2 illustrates an example of the coverage map with the best serving
cell in each pixel. Colors are used to distinguish the area covered by
each sector in the network. Each site has an identifier (BS01, BS02, ...)
and the sector identifier in each site is either A, B or C since we are
working in a tri-sectorized network.

Figure 2: Best server map in the covered area.

Note that in practice, even if each pixel has a best serving cell, a UE
located in this pixel is accepted in the network only if the signal level
received from its serving cell is higher than a certain threshold, called
Qrxlevmin in the 3GPP LTE standard.

3.2 Used Key Performance Indicators (KPIs)

3.2.1 KPI1: Timing Advance

Timing Advance (TA) (in GSM, LTE and LTE-A) or the propagation delay (in
WCDMA) is a time offset realized by the BS between its own transmission
and the transmission received from the UE. According to the calculated
offset, the serving BS determines the suitable TA for the UE [12]. Then,
from this TA, the BS calculates the distance traveled by the radio
signal. Depending on the resolution (or granularity) of TA, a specific
distance range where the UE is located is obtained. In practice, TA is
used as a KPI for network supervision and analysis, with a granularity of
78.25 meters in LTE networks [12].

Based on this granularity, TA is discretized into 6 intervals indexed by
$t$ of the form $[78.25 \times t, 78.25 \times (t + 1)]$, with
$0 \le t \le 4$, and $[391.25, +\infty)$ for $t = 5$. As illustrated in
Fig. 3, each cell is divided into several intervals according to the
above defined ranges of distances. The TA KPI that we obtain from the O&M
database is the distribution of the distance from the BS to the UEs in
the cell. For example, 30% of the UEs are in the range of $t = 0$, 20% in
the range of $t = 1$, 40% in the range of $t = 2$, 10% in the range of
$t = 3$ and 0% in the ranges of $t = 4$ and $t = 5$.

Figure 3: The division of the covered area based on TA.

For each cell $k$, we denote by $T_t(k)$ the value of the TA KPI that
gives the percentage of UEs in the TA interval $t$.

3.2.2 KPI2: Angle of Arrival 

Angle of Arrival (AoA) is defined as the estimated angle of a UE with
respect to a reference direction, typically the geographical North. The
value of AoA is positive in an anticlockwise direction [11]. In general,
any uplink signal from the UE can be used to estimate the AoA, but
typically a known signal such as the Sounding Reference Signals (SRS) or
DeModulation Reference Signals (DMRS) would be used [13]. The serving BS
determines the direction of arrival by measuring the TA at individual
elements of the antenna array, and the AoA is calculated from these
delays.

Direction of arrival estimation already exists and is supported in WCDMA,
where it is used to construct the shape of the downlink beam, but it is
not standardized [14]. Experiments in [19-21] showed that the accuracy of
AoA estimation depends mainly on the number of antenna elements and also
on the separation distance between them. It varies from small deviations
(around 2° to 5°) to significant deviations (about 30° = $\pi/6$)
[19-21].
Based on the possible AoA deviations provided in [19-21], we assume that
the cell is divided into 3 zones as in Fig. 4 relative to the angle
between the UE, the BS and the geographical North. As for the TA, we
denote by $t = -1$, 0 or 1 the index of each AoA zone.

For each cell $k$, we denote by $\Theta_t(k)$ the value of the AoA KPI
that gives the percentage of UEs in the AoA zone $t$. Therefore, the
percentage of UEs connected to cell $k$ whose angle of arrival is in the
range $[-\pi/6, \pi/6]$ is denoted by $\Theta_0(k)$. Then, $\Theta_1(k)$
represents the percentage of UEs connected to cell $k$ with an angle of
arrival larger than $\pi/6$. Also, $\Theta_{-1}(k)$ represents the
percentage of UEs connected to cell $k$ with an angle of arrival less
than $-\pi/6$. In this sectorization, we consider that the angle of
arrival is equal to zero when pixel $P_{i,j}$ has the same angle as the
azimuth [2] of its serving sector.

An example of a distribution that we can get is 30% of UEs in the range
corresponding to $t = -1$, 40% in the range $t = 0$ and 30% in the range
$t = 1$.

3.2.3 KPI3: Neighboring cell level 

During various events of UEs connected to the network, such as the
handover process, every UE measures the signals of the detected cells and
sends a report of these measurements to its serving cell. In GSM, the
list of detected neighboring cells is reported periodically by the UE to
the BS. In 3G and 4G networks, however, it is reported to the BS only
when a special event is triggered. The counter in the O&M database
representing the neighboring cell level of a given neighboring cell is
incremented whenever that cell is received (in a measurement report) as
an eligible candidate cell for handover (Fig. 5). In order to get the KPI
as a distribution, the neighboring cell level of each neighboring cell is
calculated with respect to the total number of reported neighboring
cells.

For each neighboring cell $l$ of the serving cell $k$, we denote by
$\vartheta_l(k)$ the relative number of times where cell $l$ is reported
as an eligible candidate cell for handover in a measurement report to the
serving cell $k$.

Normalizing with the total number of reported neighboring cells makes
$\vartheta_l(k)$ a density distribution KPI, i.e.

$$\sum_{l \in S_k} \vartheta_l(k) = 1$$    (4)

where $S_k$ is the set of the neighboring cells of the cell $k$. Table I
shows an example of this KPI for the given serving cell BS1A (refer to
Fig. 2).

Table I: Neighboring cell level for the cell BS1A in Fig. 2.

Cell ID            BS3A    BS7B    BS3C    BS6C    BS2C    BS2A    BS3B    BS4A
Neighboring level  0.2478  0.1371  0.1364  0.1298  0.1038  0.1007  0.0759  0.0685

[2] The azimuth is the angle between the direction of maximum antenna
radiation and the geographical North.
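
The normalization behind (4) can be illustrated with a short sketch. The function name and the raw counter values below are hypothetical; only the idea of dividing each counter by the total number of reported neighboring cells comes from the text.

```python
def neighboring_cell_levels(report_counts):
    """Turn raw handover-candidate counters into a distribution summing to 1."""
    total = sum(report_counts.values())
    if total == 0:
        # No report received for this serving cell: nothing to distribute.
        return {cell: 0.0 for cell in report_counts}
    return {cell: count / total for cell, count in report_counts.items()}


# Hypothetical raw counters for one serving cell (not taken from the paper).
counts = {"BS3A": 50, "BS7B": 30, "BS3C": 20}
print(neighboring_cell_levels(counts))
```

Returning zeros when no report was received is a design choice of this sketch, made so that an unreported cell contributes no weight.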

3.2.4 KPI4: Load time

The load time represents the percentage of time when the cell's resources
are fully occupied. This KPI is a gauge counter (i.e. a percentage of
time) already defined and provided by all O&M vendors. It is calculated
per hour and can be given for the busy hours, for the day, for the week
or for the month. In this paper, we denote by $\rho(k)$ the load time of
cell $k$.

3.2.5 KPI5: Arithmetic and Harmonic Mean 
throughputs 

The Arithmetic Mean Throughput (AMT) is the mean of the throughputs of
all connected UEs. The Harmonic Mean Throughput (HMT), on the other hand,
is the inverse of the average of the inverses of the throughputs. The HMT
differs from the AMT in that it gives more importance to the throughputs
of UEs in bad radio conditions. Based on the formula of the cell
throughput defined in the O&M database by most equipment vendors, it is
important to notice that the cell throughput has almost the same value as
the HMT.

The utility of these KPIs comes from the following possible scenarios: if
the HMT is high relative to the AMT (it is always lower than the AMT
[3]), one can infer that most of the users might be in bad radio
conditions, and if the AMT is high, most of the users should be close to
the serving cell. In other words, if most of the users are at the cell
edge, the HMT becomes more significant and the AMT decreases and becomes
close to the HMT. However, if most of the users are in good radio
conditions, the difference between the HMT and the AMT becomes
significant. Moreover, in the case of traffic uniformly distributed in
the cell between the edge and the center, the difference between the
arithmetic and the harmonic mean throughput is also large, but less than
in the case where UEs are mostly concentrated in the center.

For each cell $k$, we denote by $\mu_a(k)$ and $\mu_h(k)$ respectively
the AMT and the HMT KPIs taken from the O&M database.
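
The behavior of the two means can be checked on a small numerical example. The per-UE throughput values below are hypothetical and only meant to contrast a cell dominated by edge users with a cell dominated by center users; the function names are likewise illustrative.

```python
def arithmetic_mean(throughputs):
    """Arithmetic mean throughput (AMT) of the connected UEs."""
    return sum(throughputs) / len(throughputs)


def harmonic_mean(throughputs):
    """Harmonic mean throughput (HMT): inverse of the mean of the inverses."""
    return len(throughputs) / sum(1.0 / r for r in throughputs)


# Hypothetical per-UE throughputs in Mbit/s (illustrative values only).
edge_heavy = [1.0, 1.0, 1.0, 2.0]        # most UEs in bad radio conditions
center_heavy = [1.0, 20.0, 20.0, 20.0]   # most UEs in good radio conditions

for name, rates in (("edge-heavy", edge_heavy), ("center-heavy", center_heavy)):
    amt, hmt = arithmetic_mean(rates), harmonic_mean(rates)
    print(f"{name}: AMT={amt:.2f} HMT={hmt:.2f} gap={amt - hmt:.2f}")
```

In this example the gap between the AMT and the HMT is small for the edge-heavy cell and large for the center-heavy cell, which is the property exploited later in Step 5.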


3.3 Map of potential traffic hotspots 

A map of potential traffic hotspots is an additional layer obtained from
commercial data so as to improve the hotspot localization. In this case,
the spatial traffic distribution is calculated taking into consideration
the a priori knowledge of the location of industrial and commercial zones
or the location of residences. This map also includes information about
the location of rivers, forests, etc. Such a map represents a reference
map used to define the contribution of each KPI in the estimation of the
spatial traffic distribution, as described in subsection 4.2.

We define $\tilde{Q} = (\tilde{q}_{i,j})_{1 \le i,j \le m}$ as the matrix
representing the potential hotspots as follows

$$\tilde{q}_{i,j} = \begin{cases} \omega_{i,j} & \text{if } P_{i,j} \text{ is in a potential hotspot} \\ 0 & \text{otherwise} \end{cases}$$    (5)

$\omega_{i,j}$ represents the importance of the potential hotspot. In
fact, the weight assigned to each hotspot is evaluated according to the
heaviness of the traffic that can be carried inside it. For example, a
residential zone cannot take the same weight as a very large commercial
zone.

[3] HMT is always lower than AMT according to the mathematical definition
of the harmonic mean and the arithmetic mean.


4 TRAFFIC LOCALIZATION 
ALGORITHM: DESIGN AND 
OPTIMIZATION 

4.1 Description of the algorithm 

Before running the localization algorithm, the O&M 
reports the statistics of each KPI corresponding to the 
cells in its controlled area. Then, the following steps 
are performed. 


4.1.1 Step 1: calculate the spatial distribution of traffic weights
according to TA.

For the design of the step based on TA, we attribute to each pixel
$P_{i,j}$ a traffic weight $q^{(1)}_{i,j}$. To do this, each cell in the
coverage map is divided into 6 zones depending on the distance from the
BS as illustrated in Fig. 3. Then, each pixel takes a weight equal to the
percentage of UEs in its range of TA:

$$q^{(1)}_{i,j} = \sum_{t=0}^{5} T_t(c^*_{i,j}) \, \chi_{1,t}(P_{i,j})$$    (6)

where $\chi_{1,t}(P_{i,j})$ is the indicator function of TA zone $t$ in
cell $c^*_{i,j}$; $\chi_{1,t}(P_{i,j})$ takes the value 1 if the pixel
belongs to TA zone $t$ of its serving cell and 0 otherwise.
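
Step 1 can be sketched as follows, assuming planar pixel and BS coordinates in meters. The function names are hypothetical, and assigning a distance that falls exactly on an interval boundary to the upper zone is a choice of this sketch, since the intervals given in Section 3.2.1 share their end points.

```python
import math

TA_STEP_M = 78.25   # TA granularity used in the text
NUM_TA_ZONES = 6    # t = 0..4 are bounded, t = 5 is open-ended


def ta_zone(distance_m):
    """Index t of the TA interval containing a BS-to-pixel distance."""
    return min(int(distance_m // TA_STEP_M), NUM_TA_ZONES - 1)


def ta_weight(pixel_xy, bs_xy, ta_distribution):
    """Traffic weight of a pixel: share of UEs reported in its TA zone.

    ta_distribution[t] plays the role of T_t(k) for the pixel's best server.
    """
    distance = math.hypot(pixel_xy[0] - bs_xy[0], pixel_xy[1] - bs_xy[1])
    return ta_distribution[ta_zone(distance)]


# Example distribution quoted in Section 3.2.1 (30%, 20%, 40%, 10%, 0%, 0%).
T = [0.30, 0.20, 0.40, 0.10, 0.0, 0.0]
print(ta_weight((100.0, 0.0), (0.0, 0.0), T))   # 100 m from the BS: zone t = 1
```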

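4.1.2 Step 2: calculate the spatial distribution of traffic weights
according to AoA.

In this step, each cell in the coverage map is divided into 3 zones as
illustrated in Fig. 4. Each pixel in the same range of AoA takes a
traffic weight $q^{(2)}_{i,j}$ equal to the percentage of connected UEs
in this range of AoA:

$$q^{(2)}_{i,j} = \sum_{t=-1}^{1} \Theta_t(c^*_{i,j}) \, \chi_{2,t}(P_{i,j})$$    (7)

Likewise, $\chi_{2,t}(P_{i,j})$ is the indicator function of AoA zone $t$
in cell $c^*_{i,j}$; $\chi_{2,t}(P_{i,j})$ is given by

$$\chi_{2,t}(P_{i,j}) = \begin{cases} 1 & \text{if } \mathrm{Angle}(P_{i,j}, c^*_{i,j}) - \theta(c^*_{i,j}) \in I_t \\ 0 & \text{otherwise} \end{cases}$$    (8)

with

$$I_t = \begin{cases} [-\pi/6, \pi/6] & \text{if } t = 0 \\ \{\alpha : \alpha > \pi/6\} & \text{if } t = 1 \\ \{\alpha : \alpha < -\pi/6\} & \text{if } t = -1 \end{cases}$$    (9)

and $\mathrm{Angle}(P_{i,j}, c^*_{i,j})$ is the angle between pixel
$P_{i,j}$ and its serving cell with respect to the geographical North of
the coverage map. $\theta(c^*_{i,j})$ is the azimuth of the antenna of
cell $c^*_{i,j}$.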
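
A minimal sketch of Step 2 is given below. It assumes that angles are expressed in radians and that the difference between the pixel angle and the azimuth is wrapped to (-pi, pi] before being compared with pi/6; this wrapping and the function names are assumptions of the sketch, not details stated in the text.

```python
import math


def wrap_angle(angle_rad):
    """Wrap an angle to the interval (-pi, pi]."""
    wrapped = math.fmod(angle_rad, 2.0 * math.pi)
    if wrapped > math.pi:
        wrapped -= 2.0 * math.pi
    elif wrapped <= -math.pi:
        wrapped += 2.0 * math.pi
    return wrapped


def aoa_zone(pixel_angle_rad, azimuth_rad):
    """Zone index t in {-1, 0, 1} of a pixel relative to the sector azimuth."""
    offset = wrap_angle(pixel_angle_rad - azimuth_rad)
    if offset > math.pi / 6.0:
        return 1
    if offset < -math.pi / 6.0:
        return -1
    return 0


def aoa_weight(pixel_angle_rad, azimuth_rad, aoa_distribution):
    """Traffic weight of a pixel: share of UEs reported in its AoA zone."""
    return aoa_distribution[aoa_zone(pixel_angle_rad, azimuth_rad)]


# Example distribution quoted in Section 3.2.2, keyed by zone index t.
theta = {-1: 0.30, 0: 0.40, 1: 0.30}
print(aoa_weight(math.radians(100.0), math.radians(90.0), theta))  # zone t = 0
```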
4.1.3 Step 3: calculate the distribution of traffic weights according to
the neighboring cell level.

Exploiting the neighboring cell level is motivated by the fact that when
a neighboring cell is reported more often than the others, the pixels
having this cell as the second best serving cell probably contain most of
the traffic.

We attribute to each pixel a traffic weight $q^{(3)}_{i,j}$ equal to the
value of the neighboring cell level of its second best serving cell, as
shown in Fig. 6:

$$q^{(3)}_{i,j} = \vartheta_{c_{i,j}}(c^*_{i,j})$$    (10)

Figure 6: Example of attributing traffic weights based on neighboring
cell level.
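
The lookup of Step 3 can be sketched as follows. The RSRP values are hypothetical, the two neighboring levels are those listed in Table I, and returning 0 when the second best server does not appear in the neighbor list of the best server is an assumption of this sketch.

```python
def best_two_servers(rsrp_by_cell):
    """Return (best, second_best) cell identifiers for one pixel.

    rsrp_by_cell maps a cell identifier to its RSRP in dBm at the pixel.
    """
    ranked = sorted(rsrp_by_cell, key=rsrp_by_cell.get, reverse=True)
    return ranked[0], ranked[1]


def neighbor_weight(rsrp_by_cell, neighbor_level):
    """Traffic weight of a pixel: neighboring level of its second best server.

    neighbor_level[k][l] is the share of reports to serving cell k naming l.
    """
    best, second = best_two_servers(rsrp_by_cell)
    return neighbor_level.get(best, {}).get(second, 0.0)


# Hypothetical RSRP values (dBm) at one pixel; levels as listed in Table I.
rsrp = {"BS1A": -85.0, "BS3A": -91.0, "BS7B": -97.0}
levels = {"BS1A": {"BS3A": 0.2478, "BS7B": 0.1371}}
print(neighbor_weight(rsrp, levels))
```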


4.1.4 Step 4: calculate the distribution of traffic weights according to
cell loads.

In hotspot zones, the traffic evolution has almost the same behavior in
the neighboring cells due to load balancing and forced handovers [15]. As
a consequence, if a cell is congested and one of its neighbors is also
congested, it is likely that most of the traffic is located at the edge
between the two cells; if there is no correlation with any of the
neighboring cells, then the traffic is most probably generated by users
close to the serving cell. Note that if two cells are congested but the
heavy traffic is not located between them, the traffic weight attributed
to the region between these two cells will nevertheless be significant.
The next step corrects for this by comparing the arithmetic mean
throughput to the harmonic mean throughput of the cell.


We define $\psi_{c^*_{i,j}}$ as the set of neighboring cells that have
the same behavior in terms of load time as the serving cell $c^*_{i,j}$
and are eligible for RSRP-based handover in $P_{i,j}$. It is given by

$$\psi_{c^*_{i,j}} = \{ k \in \mathcal{C} \text{ such that } |\rho(c^*_{i,j}) - \rho(k)| < \epsilon \text{ and } |RSRP_{c^*_{i,j},i,j} - RSRP_{k,i,j}| < \Delta \}$$    (11)

where $\Delta$ is a parameter used to identify the set of candidate cells
for a possible handover (by default, it is equal to the handover margin)
and $\epsilon$ is used in order to keep only the cells having the same
behavior as the serving cell ($\epsilon$ is set to 0.1 in the
localization exercise shown in the simulation section).
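
The construction of this set can be sketched as below. The default values follow the text (a load tolerance of 0.1 and a 6 dB margin, the handover margin used in the simulations); excluding the serving cell itself from the set, the function name and the sample values are assumptions of this sketch.

```python
def similar_load_neighbors(serving, load, rsrp_at_pixel, epsilon=0.1, delta_db=6.0):
    """Cells whose load time and RSRP at the pixel are close to the serving cell's.

    load maps cell -> load time in [0, 1]; rsrp_at_pixel maps cell -> RSRP (dBm).
    """
    return {
        k
        for k in load
        if k != serving  # assumption: the serving cell is not its own neighbor
        and abs(load[serving] - load[k]) < epsilon
        and abs(rsrp_at_pixel[serving] - rsrp_at_pixel[k]) < delta_db
    }


# Hypothetical load times and RSRP values at one pixel served by cell "A".
load = {"A": 0.85, "B": 0.80, "C": 0.40, "D": 0.88}
rsrp = {"A": -90.0, "B": -93.0, "C": -92.0, "D": -101.0}
print(similar_load_neighbors("A", load, rsrp))
```

Here only cell "B" qualifies: "C" has a very different load time and "D" is received too weakly at the pixel to be a handover candidate.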

The traffic weight attributed to each pixel based 
on the load KPI is then given as follows 

g(4) ^ . (12) 

I 0 otherwise 


$n(\psi)$ denotes the number of elements in the set $\psi$, and
$\bar{\rho}$ is a threshold used to identify the very loaded serving
cells (in practice the congestion threshold is set to 70%). Thus, only
pixels belonging to congested cells get a traffic weight according to
this KPI.

4.1.5 Step 5: calculate the distribution of traffic weights according to
AMT versus HMT.

The attributed traffic weights are based on the difference between the
AMT and the HMT of the cell, divided by a constant $\mu_0$ in order to
have all the calculated weights in the range [0, 1]. The constant $\mu_0$
is the maximum throughput that a user can get. The weight in each pixel
depends on its position relative to its best serving cell and on the
value of the difference between the two mean throughputs: the pixels in
the cell center take the value of the difference between the AMT and the
HMT divided by $\mu_0$, and the rest of the pixels in the cell take a
traffic weight equal to the complement of this value, as given in (13).

The traffic weight $q^{(5)}_{i,j}$ attributed to each pixel based on the
HMT and the AMT is then formulated as follows:

$$q^{(5)}_{i,j} = \begin{cases} \dfrac{\mu_a(c^*_{i,j}) - \mu_h(c^*_{i,j})}{\mu_0} & \text{if } RSRP_{c^*_{i,j},i,j} > RSRP_0 \\ 1 - \dfrac{\mu_a(c^*_{i,j}) - \mu_h(c^*_{i,j})}{\mu_0} & \text{if } RSRP_{c^*_{i,j},i,j} \le RSRP_0 \end{cases}$$    (13)

where $RSRP_0$ is an RSRP threshold used to differentiate between
cell-edge and cell-center UEs.
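
The rule of Step 5 can be written as a small function. The parameter names and the numerical values in the example are hypothetical; the logic follows the description above, with the normalized gap for cell-center pixels and its complement for the remaining pixels.

```python
def throughput_weight(amt, hmt, rsrp_dbm, rsrp_threshold_dbm, max_user_throughput):
    """Traffic weight of a pixel from the AMT/HMT gap of its best serving cell."""
    gap = (amt - hmt) / max_user_throughput   # normalized gap between the means
    if rsrp_dbm > rsrp_threshold_dbm:         # pixel treated as cell center
        return gap
    return 1.0 - gap                          # pixel treated as cell edge


# Hypothetical values: AMT 12 Mbit/s, HMT 4 Mbit/s, maximum 20 Mbit/s per user.
print(throughput_weight(12.0, 4.0, -80.0, -100.0, 20.0))    # cell-center pixel
print(throughput_weight(12.0, 4.0, -110.0, -100.0, 20.0))   # cell-edge pixel
```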
4.1.6 Step 6: include all KPIs in one global metric and estimate the
traffic hotspot zones.

After analyzing all the KPIs and extracting information about hotspots
from them, this step identifies the way to transpose all the outputs from
the previous steps into a unique traffic weight
$Q = (q_{i,j})_{1 \le i,j \le m}$. The simplest way to get the matrix $Q$
is by making a weighted sum of the matrices
$Q^{(s)} = (q^{(s)}_{i,j})_{1 \le i,j \le m}$, $1 \le s \le 5$, with an
importance factor $x = (x_1, x_2, x_3, x_4, x_5)$ depending on the
importance of the information extracted from each KPI. So, we have that

$$Q = \sum_{s=1}^{5} x_s Q^{(s)}$$    (14)

The trivial value of $x$ is $(1/5, 1/5, 1/5, 1/5, 1/5)$. In this case,
$x$ is not optimal because some KPIs provide more precision than others
and some KPIs are correlated, as said before. For these reasons, we
define the optimal value of $x$ as the result of an optimization problem
which reduces the distance between the map of potential hotspots and the
matrix $Q$. This optimization is detailed in subsection 4.2.
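
The weighted sum of (14) can be sketched with plain nested lists. The function name and the tiny 2 x 2 maps in the example are hypothetical; the sketch accepts any number of KPI maps as long as one importance factor is supplied per map.

```python
def combine_kpi_maps(kpi_maps, factors):
    """Weighted sum of per-KPI weight maps (lists of rows of equal shape)."""
    if len(kpi_maps) != len(factors):
        raise ValueError("one importance factor is needed per KPI map")
    rows, cols = len(kpi_maps[0]), len(kpi_maps[0][0])
    return [
        [sum(x * q[i][j] for x, q in zip(factors, kpi_maps)) for j in range(cols)]
        for i in range(rows)
    ]


# Two hypothetical 2 x 2 KPI maps combined with equal importance factors.
maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]]
print(combine_kpi_maps(maps, [0.5, 0.5]))
```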


4.1.7 Step 7: smoothing of the estimated spatial distribution of the
traffic.

The estimated spatial traffic distribution is further enhanced with
smoothing, which consists in combining the value estimated in each pixel
with the values estimated in the neighboring pixels. The main advantage
of smoothing [16] is to delete isolated pixels with significant weights
(wrongly estimated hotspots). On the other hand, this step reinforces the
existing hotspots in the network, provided they are not eliminated.

It was shown in [17] that the spatial traffic distribution follows a
Lognormal distribution or a mixture of Lognormal distributions. Smoothing
the estimation with a Lognormal smoother supposes a priori knowledge of
the direction of its shape relative to the angle coordinate. Therefore,
we choose the function of the smoother to be a Gaussian distance decay
(symmetric with respect to the angle coordinate).
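
As an illustration only, a Gaussian distance-decay smoother can be sketched as below. The normalization of the kernel over a square window, the `radius` parameter and the function name are assumptions of this sketch; the kernel actually used in the paper is the one defined in (15) and (16).

```python
import math


def gaussian_smooth(weights, h, radius):
    """Smooth a 2-D weight map with a normalized Gaussian distance-decay kernel.

    weights is a list of rows; h is the kernel bandwidth in pixels and radius
    limits the neighborhood to a (2*radius + 1)^2 window around each pixel.
    """
    m, n = len(weights), len(weights[0])
    smoothed = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            total, norm = 0.0, 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    a, b = i + di, j + dj
                    if 0 <= a < m and 0 <= b < n:
                        k = math.exp(-(di * di + dj * dj) / (2.0 * h * h))
                        total += k * weights[a][b]
                        norm += k
            smoothed[i][j] = total / norm
    return smoothed


# An isolated pixel with weight 1 in an otherwise empty 5 x 5 map.
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
print(gaussian_smooth(spike, 1.0, 1)[2][2])
```

With this kind of kernel an isolated high-weight pixel is strongly attenuated, while a pixel surrounded by other high-weight pixels keeps most of its weight.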


q 


* 


^P-, -fee ,j')qi' 


(15) 


where 




1 


(16) 


with $|P_{i,j} - P_{i',j'}|$ the Euclidean distance between the pixels
$P_{i,j}$ and $P_{i',j'}$ and $h$ a parameter related to the number of
the neighboring pixels that are involved in the smoothing of the value in
pixel $P_{i,j}$ (in simulations, we take $h$ = [...]). The $q_{i,j}$ are
the elements of the weight matrix obtained from the previous step and the
$q^*_{i,j}$ are the elements of the weight matrix estimated after
smoothing.


4.2 Optimization of the localization algorithm

It is possible to manually set the importance factors, but the
performance of the localization process may then be poor compared to
existing solutions. For instance, KPIs carrying little information should
be given little weight in the aggregated traffic weight $Q$ because they
bias the results. In order to improve the accuracy of the hotspot
localization algorithm, the importance factors, as defined in Step 6,
need to be optimized so that the contribution of each KPI in the weighted
sum is justified.

The optimal vector $x$ is the solution of an optimization problem that
reduces the distance between the available map of potential hotspots
$\tilde{Q}$ and the estimated traffic map $Q$ found in Step 6 of the
localization algorithm. The optimization is performed only once, for an
area whose potential hotspots are well known. The obtained optimal
importance factors might be used later for any similar environment
(urban, suburban, rural...) without any knowledge of its potential
hotspots, because the contribution of each KPI mostly depends on the type
of environment. For instance, TA and AoA are more efficient and
contribute more in rural areas, whereas in urban areas, measuring the TA
or the AoA is more error-prone because obstacles cause more reflections
and diffractions.

The optimization of $x$ can be formulated as follows:

$$\min_{x} \sum_{i=1}^{m} \sum_{j=1}^{m} (q_{i,j} - \tilde{q}_{i,j})^2 \quad \text{with} \quad q_{i,j} = \sum_{s=1}^{5} x_s q^{(s)}_{i,j}, \quad x_s \ge 0, \; s = 1..5$$    (17)

In order to put the optimization problem in a known and solvable form, we
consider

$$A = \begin{pmatrix} q^{(1)}_{1,1} & \cdots & q^{(1)}_{1,m} & q^{(1)}_{2,1} & \cdots & q^{(1)}_{m,m} \\ \vdots & & & & & \vdots \\ q^{(5)}_{1,1} & \cdots & q^{(5)}_{1,m} & q^{(5)}_{2,1} & \cdots & q^{(5)}_{m,m} \end{pmatrix}^{T}, \qquad b = [\tilde{q}_{1,1} \cdots \tilde{q}_{m,m}]^{T}$$

Clearly, $A$ is a matrix of $m^2$ rows and 5 columns and $b$ is a vector
of length $m^2$.
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Using matrix notation, the optimization problem of 
equation (17) becomes a minimization of the distance 
between the vectors Ax and b in the space : 

minimize 11 Ax — blU 
Xs>0 , 1 < 5 < 5 

where ||.||2 is the standard 2-norm (the Euclidian 
norm) in the space . 

This formulation is a least-squares optimization problem and can be
solved using the Gauss-Newton method [18]. This method is fast and
provides an accurate solution. As noted above, the optimal importance
factors are not tied to a limited number of scenarios. They may change
from one scenario to another, but only slightly, since the importance
of each KPI remains more or less the same and depends only weakly on
the scenario considered. The optimal vector x leads to the more
precise hotspot localization that is the objective of this paper.
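
As an illustration of the least-squares step, the following minimal sketch applies one Gauss-Newton update to the residual r(x) = Ax - b. Because this residual is linear in x, a single step from any starting point already gives the unconstrained least-squares solution whenever A^T A is invertible. The matrix, the vector and the helper names (`solve_linear_system`, `gauss_newton_step`) are toy assumptions, not the data or the code used in the paper, and the constraints on x are not enforced here; they are only checked afterwards.

```python
def solve_linear_system(M, v):
    """Solve M y = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * y[c] for c in range(i + 1, n))
        y[i] = (aug[i][n] - tail) / aug[i][i]
    return y


def gauss_newton_step(A, b, x):
    """One Gauss-Newton update for the residual r(x) = A x - b (Jacobian = A)."""
    rows, cols = len(A), len(A[0])
    r = [sum(A[i][s] * x[s] for s in range(cols)) - b[i] for i in range(rows)]
    JtJ = [[sum(A[i][p] * A[i][q] for i in range(rows)) for q in range(cols)]
           for p in range(cols)]
    Jtr = [sum(A[i][p] * r[i] for i in range(rows)) for p in range(cols)]
    delta = solve_linear_system(JtJ, [-g for g in Jtr])
    return [x[s] + delta[s] for s in range(cols)]


# Toy data (assumption): 4 pixels, 2 KPIs instead of m^2 pixels and 5 KPIs.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
x_true = [0.7, 0.3]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]

x_hat = gauss_newton_step(A, b, [0.5, 0.5])
print(x_hat)
```

In this toy case the recovered factors happen to be non-negative; a problem where the unconstrained solution violates the constraints would need a constrained solver instead.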

5 Numerical Results

5.1 Parameter settings

In order to validate and evaluate the proposed algorithm, we use an
LTE simulator that allows dynamic user arrivals and departures after
service. At each time step of the simulator, equal to 1 s, the number
of call attempts is generated according to a Poisson distribution
with intensity λ = 200 UEs/s. UE positions are generated depending on
the traffic weight in each pixel of the coverage map. UEs are
accepted in the network only when there are available resources and
when their RSRP is higher than the threshold Qrxlevmin (set to
-115 dBm in this work). We suppose that each UE has a file of size
1 Mbit to download. Each UE leaves the network after downloading the
entire file. Moreover, we consider that a part of the UEs (20%) are
moving during their transmission with a mobility of S.33km/h. This
means that the simulation also supports handover events, with a
margin set to 6 dB. UEs are scheduled according to the round robin
model. At the end of the simulation, which lasts 1 hour, all KPIs
(including the previously cited KPIs) are calculated and stored in a
file.
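
The arrival and admission rules described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulator used in this work: the function names, the toy traffic-weight map and the `free_resources` counter are hypothetical. Only the Poisson intensity (200 UEs/s), the 1 s step and the -115 dBm threshold come from the text.

```python
import random


def poisson_arrivals(rate_per_s, step_s, rng):
    """Number of arrivals of a Poisson process during one time step,
    obtained by summing exponential inter-arrival times."""
    count = 0
    t = rng.expovariate(rate_per_s)
    while t < step_s:
        count += 1
        t += rng.expovariate(rate_per_s)
    return count


def draw_ue_pixels(traffic_weights, n_ues, rng):
    """Pick one pixel per new UE, with probability proportional to the
    traffic weight of the pixel."""
    pixels = list(traffic_weights)
    weights = [traffic_weights[p] for p in pixels]
    return rng.choices(pixels, weights=weights, k=n_ues)


def admit(rsrp_dbm, free_resources, threshold_dbm=-115.0):
    """Admission rule: resources must be available and the RSRP must
    exceed the threshold."""
    return free_resources > 0 and rsrp_dbm > threshold_dbm


rng = random.Random(0)
# Toy 2 x 2 traffic-weight map (assumption); the last pixel carries no traffic.
traffic_weights = {(0, 0): 0.7, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.0}

n_new = poisson_arrivals(200.0, 1.0, rng)
positions = draw_ue_pixels(traffic_weights, n_new, rng)
print(n_new, positions[:5])
```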

The simulated network is shown in Fig. 2 and is composed of 23
tri-sectorized eNBs (evolved Node-Bs) covering an area of
3 x 3 km². eNB positions and physical characteristics are taken from
a real network. Each sector in each eNB is equipped with a directive
antenna having a beamwidth equal to 65°. Moreover, each sector has an
available capacity equivalent to 20 MHz of spectral bandwidth
operating in the 2.6 GHz band. Because of the network heterogeneity,
the parameters involved in this simulation (e.g., eNB transmitter
noise figure, cable loss, maximum transmit power...) depend on each
site. Hence, we do not list all of these parameters here.

The coverage map (RSRP from each cell in each pixel), provided by a
planning tool, has a resolution of 25 meters. According to this map,
the first and the second best serving cells are identified in each
pixel.

Before starting the simulation, we assign the traffic weight in each
pixel of the coverage map based on a mixture of Log-normal
distributions [17]. As shown in Fig. 7, each peak of a Log-normal
density represents a traffic hotspot.

For the potential traffic map, we assign arbitrary weights to some
pixels according to our knowledge of some hotspot zones in the chosen
map. The exploitation of this map improves the precision of hotspot
localization, but recall that it is not a key element of the
proposed algorithm.

The validation of the algorithm consists in finding 
the traffic weight in each pixel (following the steps of 
the algorithm) and comparing it with the original traf¬ 
fic weights generated at the beginning of the simula¬ 
tion. 

5.2 Results 

Running the hotspot localization algorithm and performing the
optimization, as described in Subsection 4.2, we get the importance
factors

x = [0.418 0.2689 0.2281 0.0358 0.0491]^T

The validation of the obtained importance factors is not the
objective of the present paper. However, we tested other sets of
values for the importance factors, and the results are clearly
improved after the optimization step. These optimal importance
factors are therefore used in the performance evaluation, for the
reasons cited before.

In Fig. 7, the original distribution of the traffic generated at the
beginning of the simulation is drawn. In Fig. 8a, the estimated
traffic distribution is depicted based on the proposed algorithm
without including the smoothing process. Fig. 8b represents the
spatial distribution of the traffic after smoothing the calculated
distribution.

Regarding the identification of the hotspot zones, we observe that
most of the hotspot zones across the entire network map are found.

In Table 2, the coordinates of the generated hotspots and those
estimated in step 6 and in step 7 are extracted from the calculated
matrices generating the figures above.

Table 2: Coordinates (x,y), in meters, of the generated and estimated
hotspots holding the highest traffic weights, and distance (in
meters) between the original and the estimated hotspot.

Generated hotspot | Estimated hotspot in step 6 | Estimated hotspot in step 7 | Distance, step 6 | Distance, step 7
(1100,960)        | (1140,940)                  | (1100,960)                  | 44.72            | 0
(760,940)         | (740,940)                   | (760,900)                   | 20               | 40
(20,980)          | (40,980)                    | (40,980)                    | 20               | 20
(-1020,-640)      | (-1040,-660)                | (-1020,-660)                | 28.28            | 20
(1040,280)        | (1200,420)                  | (1100,340)                  | 212.6            | 84.85
(-200,-120)       | (-240,-180)                 | (-180,-120)                 | 72.11            | 20
(-960,740)        | (-940,780)                  | (-920,760)                  | 44.72            | 44.72
(220,100)         | (320,140)                   | (260,100)                   | 107.7            | 40
(960,-300)        | (940,-280)                  | (960,-320)                  | 28.28            | 20

From Table 2, we find that running the proposed algorithm up to
step 6 pinpoints the hotspots generated in the network with a
precision of 59.84 meters (see Table 2, the mean of the calculated
distances between the original and the estimated hotspots in step 6).
However, the spatial traffic distribution is quite uniform inside
these hotspots. The shape of the estimated traffic distribution is
then improved by smoothing, and the precision becomes about 31 meters
(see Table 2, the mean of the calculated distances between the
original and the estimated hotspots in step 7).
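
The distance rows of Table 2 are plain Euclidean distances between coordinate pairs. The short sketch below recomputes them from the coordinates listed in the table; the variable and function names are illustrative only.

```python
import math

# Coordinates (x, y) in meters, copied from Table 2.
generated = [(1100, 960), (760, 940), (20, 980), (-1020, -640), (1040, 280),
             (-200, -120), (-960, 740), (220, 100), (960, -300)]
step6 = [(1140, 940), (740, 940), (40, 980), (-1040, -660), (1200, 420),
         (-240, -180), (-940, 780), (320, 140), (940, -280)]
step7 = [(1100, 960), (760, 900), (40, 980), (-1020, -660), (1100, 340),
         (-180, -120), (-920, 760), (260, 100), (960, -320)]


def distances(reference, estimate):
    """Euclidean distance between each reference point and its estimate."""
    return [math.hypot(rx - ex, ry - ey)
            for (rx, ry), (ex, ey) in zip(reference, estimate)]


d6 = distances(generated, step6)
d7 = distances(generated, step7)
for g, a, b in zip(generated, d6, d7):
    print(g, round(a, 2), round(b, 2))
```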

Fig. 9 shows the CDF of the traffic weights attributed to pixels by
the proposed algorithm. To localize the hotspot zones, we are
interested only in the region circled by the blue ellipse in Fig. 9,
which represents the significant weights.

Fig. 9 indicates that using all 5 KPIs in the network gives a better
estimation of the traffic distribution than using only some of them.
In fact, significant weights have a small density when only one KPI
is used, which means that the traffic distribution is flatter and is
uniform inside the hotspot zone. This density increases and
approaches the exact distribution when all the KPIs are used and when
all the steps of the proposed algorithm are performed.
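
For readers who want to reproduce this kind of curve, an empirical CDF of pixel weights can be computed as in the following minimal sketch. The weight values, the 0.15 threshold and the function names are made up for illustration and do not come from the simulation.

```python
def empirical_cdf(weights):
    """Return the sorted weights and, for each one, the fraction of
    pixels whose weight is less than or equal to it."""
    xs = sorted(weights)
    n = len(xs)
    return xs, [(k + 1) / n for k in range(n)]


def share_above(weights, threshold):
    """Fraction of pixels whose weight is strictly above the threshold,
    i.e. 1 - CDF(threshold)."""
    return sum(1 for w in weights if w > threshold) / len(weights)


# Toy pixel weights (assumption): mostly low, with one dominant hotspot pixel.
weights = [0.6, 0.0, 0.1, 0.2, 0.0, 0.1]
xs, cdf = empirical_cdf(weights)
print(list(zip(xs, cdf)))
print(share_above(weights, 0.15))
```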

From Fig. 9, we note that the most useful KPIs are the TA, the AoA
and the neighboring cell level. The mean throughput and the load time
also have an impact on the estimation and further improve the
localization.



(a) Estimated traffic distribution in step 6.
(b) Estimated traffic distribution in step 7.
Figure 8: Spatial traffic distribution.


In addition, smoothing the estimated traffic distribution reduces the
difference between the original spatial distribution of the traffic
and the estimated one.

Among the curves plotted in Fig. 9, the curve representing the
combination of TA and neighboring cell level allows an approximate
comparison between the solution proposed in [5] and the present
algorithm. The new algorithm performs better and provides promising
results compared with the solution disclosed in [5].

In Table 3, we compare the percentage of traffic detected in the
hotspots that have the highest weights with the percentage actually
carried by the highest hotspots in the network.

In Table 3, the analysis is based on weights sorted in decreasing
order. We determine the coordinates of the pixels which hold 0.5%
(respectively 1, 2, 5, 10, 20 and 50%) of the traffic in the network
(using the real distribution). Then, we calculate the sum of the
estimated weights in these pixels.

Figure 9: CDF of real and estimated traffic weights.

Table 3: Percentage of detected hotspots.

Real % of first hotspots | Estimated % with only TA | Estimated % with all KPIs | Estimated % after smoothing
0.5                      | 0.12                     | 0.30                      | 0.422
1                        | 0.26                     | 0.48                      | 0.58
2                        | 0.46                     | 1.09                      | 1.13
5                        | 1.12                     | 2.17                      | 2.56
10                       | 2.29                     | 4.34                      | 4.8
20                       | 4.87                     | 33.45                     | 9.7
50                       | 14.22                    | 74.34                     | 27.32
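
The evaluation procedure behind Table 3 can be sketched as follows, under stated assumptions. Pixels are sorted by decreasing real weight, the top pixels are kept until they hold the target share of the real traffic, and the estimated weights of those same pixels are summed. Expressing the result as a percentage of the total estimated weight is an assumption of this sketch, and the toy maps and the name `detected_share` are hypothetical.

```python
def detected_share(real, estimated, target_pct):
    """Percentage of the estimated traffic found in the pixels that hold
    the top `target_pct` percent of the real traffic."""
    order = sorted(real, key=real.get, reverse=True)
    total_real = sum(real.values())
    total_est = sum(estimated.values())
    goal = target_pct * total_real / 100.0

    picked, cumulative = [], 0.0
    for pixel in order:
        if cumulative >= goal:
            break
        picked.append(pixel)
        cumulative += real[pixel]

    found = sum(estimated.get(pixel, 0.0) for pixel in picked)
    return 100.0 * found / total_est


# Toy maps (assumption): four pixels with real and estimated weights.
real = {"a": 50, "b": 30, "c": 15, "d": 5}
estimated = {"a": 40, "b": 35, "c": 15, "d": 10}

print(detected_share(real, estimated, 50))
print(detected_share(real, estimated, 80))
```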


The purpose of the evaluation in Table 3 is to show the importance of
using all the KPIs and the impact of smoothing on the localization of
the most significant hotspots. The column reporting the performance
with TA alone is included because several traffic localization
techniques in the literature are based on TA. This allows a common
comparison of these techniques with the proposed one. In this
context, Table 3 shows that hotspot localization is clearly improved
when it is based on the 5 KPIs compared with the use of TA alone.

From Table 3, we also observe that the first hotspots, those with the
highest weights, are reasonably well estimated, and that this
estimation is further enhanced by the smoothing step. However, when
we increase the percentage of the first significant hotspots (from
0.5% to 10%), the estimation becomes less efficient, because some
zones close to the hotspots can take a significant weight without
carrying heavy traffic.

Before smoothing, pixels within the same region of a hotspot take the
same weight. After smoothing, weights in the center of a hotspot are
increased and weights of pixels at the edge of a hotspot are reduced.
Hence, when the percentage of the first significant hotspots is low,
only pixels in the center of a hotspot are evaluated and the sum of
weights before smoothing is less than the sum after smoothing. In
contrast, when the percentage of the first significant hotspots is
high, pixels at the edge of the hotspots are also included in the
evaluation. As a result, the sum of weights before smoothing is
larger than the sum after smoothing, since weights of pixels at the
edge of hotspots are reduced by smoothing.

6 Conclusion

We have proposed in this work a modular and optimized algorithm that
combines several KPIs with a coverage map and a map of potential
traffic hotspots. Results showed an acceptable localization error in
cases of moderate and heavy traffic. Therefore, the use of O&M KPIs
projected over a coverage map can be efficient for hotspot
localization, as they yield promising results at low operational
cost. This method is a good solution that can be used both to
identify areas where a small cell must be deployed and to perform
appropriate configurations to lessen the congestion rate in hotspot
zones.

In future work, we intend to study the impact of imperfect hotspot
localization on the network performance. This can be done by
analyzing the extra interference that would result from a bad
positioning of the small cell and the degraded throughput experienced
by the involved UEs in the network.